Methods for quantifying nucleic acid variations

ABSTRACT

Provided herein is technology relating to evaluating the state of nucleic acids and particularly, but not exclusively, to methods for measuring variations between DNAs, including differences in methylation and mutation.

FIELD OF INVENTION

Provided herein is technology relating to evaluating the state of nucleic acids and particularly, but not exclusively, to methods for measuring variations between DNAs, including differences in methylation and mutation.

BACKGROUND

DNA methylation is an epigenetic modification that regulates gene expression and marks imprinted genes. Consequently, aberrant DNA methylation is known to disrupt embryonic development and cell cycle regulation, and it can promote oncogenesis that produces cancers. In mammals, methylation occurs only at cytosine residues and more specifically only on a cytosine residue that is adjacent to a guanosine residue (that is, at the sequence CG, often denoted “CpG”). Detecting and mapping sites of DNA methylation are essential steps for understanding epigenetic gene regulation and providing diagnostic tools for identifying cancers and other disease states associated with errors in gene regulation.

Mapping methylation sites is currently accomplished by the bisulfite method described by Frommer, et al. for the detection of 5-methylcytosines in DNA (Proc. Natl. Acad. Sci. USA 89: 1827-31 (1992), explicitly incorporated herein by reference in its entirety for all purposes) or variations thereof. The bisulfite method of mapping 5-methylcytosines is based on the observation that cytosine, but not 5-methylcytosine, reacts with hydrogen sulfite ion (also known as bisulfite). The reaction is usually performed according to the following steps: first, cytosine reacts with hydrogen sulfite to form a sulfonated cytosine. Next, spontaneous deamination of the sulfonated reaction intermediate results in a sulfonated uracil. Finally, the sulfonated uracil is desulfonated under alkaline conditions to form uracil. Detection is possible because uracil forms base pairs with adenine (thus behaving like thymine), whereas 5-methylcytosine base pairs with guanine (thus behaving like cytosine). This makes the discrimination of methylated cytosines from non-methylated cytosines possible by, e.g., bisulfite genomic sequencing (Grigg G, & Clark S, Bioessays (1994) 16: 431-36; Grigg G, DNA Seq. (1996) 6: 189-98) or methylation-specific PCR (MSP) as is disclosed, e.g., in U.S. Pat. No. 5,786,146.
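The read-out of the bisulfite reaction described above can be illustrated with a minimal in-silico sketch. It is not part of the cited methods; the function name, the example sequence, and the 0-based position convention are assumptions made only for illustration. Unmethylated cytosines are reported as T (uracil is copied as thymine during amplification), while methylated cytosines remain C.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the sequence as it would read after bisulfite treatment and PCR.

    methylated_positions holds 0-based indices of cytosines assumed to be
    5-methylcytosine (a hypothetical input, for illustration only).
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # unmethylated C -> U, amplified as T
        else:
            converted.append(base)  # 5-methylcytosine and other bases unchanged
    return "".join(converted)


# Hypothetical 7-nt sequence with CpG cytosines at positions 1 and 4.
unmethylated_read = bisulfite_convert("ACGTCGA")
methylated_read = bisulfite_convert("ACGTCGA", methylated_positions={1, 4})
```

The two reads differ only at the CpG cytosines, which is the sequence difference that bisulfite sequencing or methylation-specific primers exploit.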

A gene's methylation state or mutation/polymorphism state is often expressed as the fraction or percentage of individual strands of DNA that are methylated/mutant at a particular site (e.g., at a single nucleotide or at a longer sequence of interest, e.g., up to a ~100-bp subsequence of a DNA) relative to the total population of DNA in the sample comprising that particular site. For simplicity, the discussion below is directed to measuring methylation but it is equally applicable to the measurement of mutations and polymorphism in nucleic acid populations.

Traditionally, the amount of unmethylated (e.g., native) gene is determined by quantitative PCR (qPCR) using calibrators. Then, a known amount of DNA is bisulfite treated and the resulting methylation-specific sequence is determined using either a real-time PCR or an equivalent exponential amplification. In particular, conventional methods generally comprise generating a standard curve for the unmethylated target by using external standards. The standard curve is constructed from at least two points and relates the real-time C_(p) value for unmethylated DNA to known quantitative standards. Then, a second standard curve for the methylated target is constructed from at least two points and external standards. This second standard curve relates the C_(p) for methylated DNA to known quantitative standards. Next, the test sample C_(p) values are determined for the methylated and unmethylated populations and the genomic equivalents of DNA are calculated from the standard curves produced by the first two steps. The percentage of methylation at the site of interest is calculated from the amount of methylated DNAs relative to the total amount of DNAs in the population, e.g., (number of methylated DNAs)/(number of methylated DNAs + number of unmethylated DNAs) × 100.
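The conventional calculation described above can be sketched as follows. This is an illustrative outline only: the standard quantities, the C_(p) values, and all function names are hypothetical, and a simple least-squares line is assumed for each standard curve.

```python
def fit_standard_curve(log10_copies, cps):
    """Least-squares fit of Cp = intercept + slope * log10(copies)."""
    n = len(cps)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cps) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, cps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_cp(cp, slope, intercept):
    """Invert the standard curve to estimate copies from a measured Cp."""
    return 10 ** ((cp - intercept) / slope)


def percent_methylated(copies_m, copies_u):
    """(methylated) / (methylated + unmethylated) x 100."""
    return 100.0 * copies_m / (copies_m + copies_u)


# Hypothetical external standards: log10(copies) and their measured Cp values.
std_log10 = [2.0, 3.0, 4.0, 5.0]
meth_curve = fit_standard_curve(std_log10, [33.0, 30.0, 27.0, 24.0])
unmeth_curve = fit_standard_curve(std_log10, [33.5, 30.5, 27.5, 24.5])

# Hypothetical test-sample Cp values for the two populations.
copies_m = copies_from_cp(30.0, *meth_curve)
copies_u = copies_from_cp(27.5, *unmeth_curve)
pct = percent_methylated(copies_m, copies_u)
```

Even in this small sketch, eight standard reactions and two sample reactions feed the final percentage, which illustrates how many separate measurements contribute error to the conventional result.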

Accordingly, these conventional methods require the construction of standard curves from several external standard PCRs and then require calculating a putative absolute number of methylated DNA sites or strands in one portion of the test sample and a putative absolute number of unmethylated sites or strands of DNA from another portion of the test sample. These methods require the user to assemble several reaction mixtures, which can be labor intensive and time-inefficient, and which increases the likelihood of error. In addition, the number of reactions requires a relatively large amount of DNA to provide enough template for all the necessary PCR mixtures, and thus is sample-inefficient. Furthermore, each of the numerous measurements has an associated error that is propagated in calculating the extent of methylation in the test sample. In particular, at least two standards are assembled and measured to construct the methylated DNA standard curve, at least two standards are assembled and measured to construct the unmethylated DNA standard curve, and multiple aliquots of the test sample are assembled and measured. Additionally, well-to-well variation (e.g., amongst the wells of a 96-well assay plate) between external standards and the test sample can also introduce significant errors in the measurement. For instance, the typical calibration methods used for fluorescence real-time PCR thermocyclers can unpredictably produce well-to-well variations of 1 C_(p) unit or more. As such, these variations in sample measurement as a function of location on the assay plate can cause substantial errors for the analysis of a test sample.

SUMMARY

The present technology provides methods and systems for determining the fractional amount of a nucleic acid target that is variant, e.g., as compared to a reference or non-variant nucleic acid, expressed as percent variant of said nucleic acid target. The nature of the variation is not limited to any particular type of variation. For example, in some embodiments, the variation may be a particular methylation status, e.g., the percentage of a nucleic acid that is methylated compared to the amount of the same target nucleic acid that is not methylated. In some embodiments, the technology relates to determining the percentage of target nucleic acid that contains a mutation or polymorphism or particular allele, compared to the target nucleic acid that does not contain the particular mutation or polymorphism or allele of interest.

In some embodiments, methods of the technology comprise the steps of:

a) providing quantitative amplification data from a sample comprising nucleic acid target, wherein said nucleic acid target comprises at least one copy of a non-variant form of said nucleic acid target and/or at least one copy of a variant form of said nucleic acid target, wherein said quantitative amplification data correlates to amplification cycle numbers for said non-variant and variant forms of said nucleic acid target present in said sample;

b) determining a first crossing threshold indicative of variant copies of said nucleic acid target (C_(p)v);

c) determining a second crossing threshold indicative of non-variant copies of said nucleic acid target (C_(p)nv);

d) calculating a ratio R of variant copies of said nucleic acid target to non-variant copies of said nucleic acid target; and

e) calculating the percentage of nucleic acid target present in said sample that is variant form.

In some embodiments, prior to calculating the ratio R, the method comprises a step of determining a log copy number for the variant and non-variant copies of the target nucleic acid. In certain preferred embodiments, the log copy number for variant copies of said target nucleic acid is described by “log copy = (C_(p)v − C_(p)0)/−S” and the log copy number for non-variant copies of the target nucleic acid is described by “log copy = (C_(p)nv − C_(p)0)/−S,” wherein C_(p)0 is the intercept of the y axis when log copy number = 0 and wherein S = the slope of the linear regression of reaction efficiency. In some embodiments, the method further comprises calculating an offset C_(p)off between the regression lines of C_(p)v and C_(p)nv.

In certain preferred embodiments, the ratio R is calculated according to the expression “ratio R = 10^((C_(p)v − C_(p)nv + C_(p)off)/−S).” In still further preferred embodiments, the percentage of nucleic acid target in the sample that is variant form is calculated according to the expression “percent variant = 100 × (R/(R+1)).”
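The two expressions above can be written out as a short sketch. The function names and the example C_(p) values are hypothetical; the default S of 3 follows the preferred value stated in this summary, and the offset is assumed to be zero unless supplied.

```python
def ratio_r(cp_v, cp_nv, cp_off=0.0, s=3.0):
    """Ratio R of variant to non-variant copies: 10^((Cpv - Cpnv + Cpoff) / -S)."""
    return 10 ** ((cp_v - cp_nv + cp_off) / -s)


def percent_variant(r):
    """Percent variant = 100 x (R / (R + 1))."""
    return 100.0 * r / (r + 1.0)


# Hypothetical sample: the variant signal crosses 3 cycles later than the
# non-variant signal, so variant copies are about one tenth of non-variant.
r = ratio_r(cp_v=30.0, cp_nv=27.0)
pct = percent_variant(r)

# Equal crossing points imply equal copy numbers, i.e., 50% variant.
pct_equal = percent_variant(ratio_r(cp_v=28.0, cp_nv=28.0))
```

Because both C_(p) values come from the same reaction, no external standard curve enters the calculation; only the difference between the two crossing points matters.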

In some embodiments, the value for S is a value estimated from a range for use in the calculation of the log copy values. In some embodiments, the value used for S is greater than or equal to about 2.7, while in some embodiments, the value used for S is less than or equal to about 3.3. In certain preferred embodiments, the value used for S is 3.
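To give a sense of how the choice of S within the stated range affects the result, the following sketch evaluates the percent variant at S = 2.7, 3.0, and 3.3 for a single hypothetical C_(p) difference. The function name and the 3-cycle difference are assumptions for illustration, and the offset is taken as zero.

```python
def percent_variant_from_delta(delta_cp, s):
    """Percent variant for delta_cp = Cpv - Cpnv, with zero offset assumed."""
    r = 10 ** (delta_cp / -s)
    return 100.0 * r / (r + 1.0)


# Hypothetical sample in which the variant crosses 3 cycles after the non-variant.
delta_cp = 3.0
results = {s: percent_variant_from_delta(delta_cp, s) for s in (2.7, 3.0, 3.3)}

for s, pct in results.items():
    print(f"S = {s}: {pct:.1f}% variant")
```

For this example the estimate moves from roughly 7% to roughly 11% across the range, so an estimated S shifts the reported percentage by only a few points.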

The quantitative amplification data is not limited to data from any particular type of reaction. It is contemplated that the technology finds use in, for example, ligase chain reactions, transcription mediated amplification, scorpion probe-based assays, etc. In some embodiments, the quantitative amplification data is from a quantitative PCR assay (including, e.g., reverse-transcription PCR detection of RNA), such as a real time quantitative polymerase chain reaction. In certain preferred embodiments, the quantitative PCR assay is a PCR+INVADER assay. In particularly preferred embodiments, the PCR+INVADER assay is a QuARTS assay.

It is contemplated that the data is collected from any detectable signal or attribute of amplification products, e.g., fluorescence, luminescence, radiation, polarization, mass, etc. In certain preferred embodiments, the quantitative amplification data comprises fluorescence data.

The technology of the invention finds application for the analysis of any type of variation between two or more nucleic acids in a sample. For example, in some embodiments, the variation relates to methylation, e.g., the variant nucleic acid is methylated DNA and the non-variant nucleic acid is unmethylated DNA, or vice versa. In other embodiments, the variant nucleic acid is nucleic acid containing a mutation, polymorphism, or other sequence-based difference, and the non-variant nucleic acid is wild type nucleic acid, or any nucleic acid to which the variant is to be compared such that the relative fractions represented by the variants and non-variants constitute 100% of the copies of interest. For example, the technology also finds use in detecting relative amounts of two or more mutations in a sample.

The technology may also be applied using a reference gene that is unrelated to the target nucleic acid, e.g., a gene that is present in the sample in copy numbers that are the same as the copy numbers of the target being measured. For example, the portion of a single copy gene that is methylated in a sample may be quantified by comparison to another single copy gene. Thus, in some embodiments, the technology provides a method to determine the fractional amount of a nucleic acid target that is variant, comprising the steps of:

a) providing a sample comprising:

i) a population of copies of the nucleic acid target, wherein the population of copies of the nucleic acid target comprises at least one non-variant copy of the nucleic acid target, and/or at least one variant copy of the nucleic acid target, and

ii) a population of copies of a reference nucleic acid target, wherein said population of copies of the reference nucleic acid target in said sample contains approximately the same number of copies as said population of copies of the nucleic acid target;

b) treating the test sample with a quantitative amplification assay to produce quantitative amplification data from the sample, wherein the quantitative amplification data correlates to amplification cycle number for said variant form of said nucleic acid target and said reference nucleic acid target;

c) determining a first crossing threshold indicative of copies of said variant form of said nucleic acid target (C_(p)v);

d) determining a second crossing threshold indicative of copies of said reference nucleic acid target (C_(p)r);

e) calculating a ratio R of copies of said variant form of said nucleic acid target to copies of said reference target nucleic acid; and

f) calculating the percent variant of said target nucleic acid present in said sample.

As with the method discussed above, in some embodiments, prior to calculating the ratio R, said method comprises a step of determining a log copy number for copies of the variant nucleic acid target and for copies of the reference nucleic acid target. In certain preferred embodiments, the log copy number for copies of the variant form of the nucleic acid target is described by the expression “log copy = (C_(p)v − C_(p)0)/−S,” and the log copy number for the reference nucleic acid target is described by “log copy = (C_(p)r − C_(p)0)/−S,” wherein C_(p)0 is the intercept of the y axis when log copy number = 0, and wherein S = the slope of the linear regression of reaction efficiency. In some embodiments, the method further comprises calculating an offset C_(p)off between the regression lines of C_(p)v and C_(p)r.

In some embodiments, the ratio R is calculated according to the expression “ratio R = 10^((C_(p)v − C_(p)r + C_(p)off)/−S).” In certain preferred embodiments, the percentage of the nucleic acid target that is variant form is calculated according to the expression “percent variant = 100 × R.”
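A brief sketch of the reference-gene form of the calculation follows. Here the reference is assumed to represent the total copy number of the target, so the percentage is 100 × R rather than 100 × R/(R+1). The function name and the C_(p) values are hypothetical.

```python
def percent_variant_vs_reference(cp_v, cp_r, cp_off=0.0, s=3.0):
    """Percent variant = 100 x R, with R = 10^((Cpv - Cpr + Cpoff) / -S)."""
    r = 10 ** ((cp_v - cp_r + cp_off) / -s)
    return 100.0 * r


# Hypothetical sample: the variant signal crosses 1.5 cycles after the
# reference gene signal measured in the same reaction.
pct = percent_variant_vs_reference(cp_v=28.5, cp_r=27.0)

# A variant crossing at the same cycle as the reference implies that
# essentially all copies of the target are variant.
pct_full = percent_variant_vs_reference(cp_v=27.0, cp_r=27.0)
```

The difference from the two-population formula arises because the reference counts every copy, variant or not, whereas the non-variant signal counts only the complement of the variant population.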

In some embodiments, the reference nucleic acid target comprises a nucleic acid encoding at least a portion of a housekeeping protein, and in certain preferred embodiments, the housekeeping protein is an actin polypeptide, e.g., α-actin. As discussed above, the technology of the invention is not limited to any particular variation in its application. Variations of nucleic acid targets include methylation, mutation, deletion, polymorphism, allelic differences, etc.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects, and advantages of the present technology will become better understood with regard to the following drawings:

FIG. 1 shows threshold crossing point (Cp) as measured using a series of control samples having the indicated number of strands of vimentin target nucleic acid. Methylated and non-methylated target strands were measured together in the same assay and in the same reaction vessel. One copy of double-stranded DNA is counted as two strands. FIG. 1A shows the measured Cps for methylated vimentin strands. FIG. 1B shows the measured Cps for non-methylated vimentin strands.

FIG. 2 shows the measured Cps for methylated (Me) and non-methylated (Non Me) vimentin DNA from samples numbered 33-80. The numbers of Me and Non-Me strands are calculated by reference to the standards shown in FIG. 1.

FIG. 3 shows the percentage of methylated vimentin DNA calculated from the strand counts in FIG. 2 for each of samples 33-80, compared to the percentage of methylation in the same set of samples predicted by comparing the Cp values for each sample using the methods of the invention. The formulae for calculating the percentage methylation from the values measured for methylated strands (Strm) and non-methylated strands (Strnm) in FIG. 2, and for calculating a Cp ratio R and predicting a % methylation value from R for each sample are shown in the header of the table.

FIG. 4 shows a graph comparing the percent methylation calculated for each sample by comparison to external standards (“measured” % methylation) to the percent methylation predicted for each sample by analysis of Cpm and Cpnm values measured for the same sample (“predicted” % methylation).

FIG. 5 shows a graph comparing the percent methylation calculated for vimentin DNAs in each sample by comparison to external vimentin standards (“calculated % methylation”) to the percent methylation predicted by analysis of Cp values measured for actin and methylated vimentin in the same sample (“predicted % methylation”).

DETAILED DESCRIPTION

Definitions

To facilitate an understanding of the present technology, a number of terms and phrases are defined below. Additional definitions are set forth throughout the detailed description.

Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment, though it may. Furthermore, the phrase “in another embodiment” as used herein does not necessarily refer to a different embodiment, although it may. Thus, as described below, various embodiments of the invention may be readily combined, without departing from the scope or spirit of the invention.

In addition, as used herein, the term “or” is an inclusive “or” operator and is equivalent to the term “and/or” unless the context clearly dictates otherwise. The term “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of “a”, “an”, and “the” includes plural references. The meaning of “in” includes “in” and “on.”

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

As used herein, the terms “subject” and “patient” refer to any animal, such as a dog, cat, bird, livestock, and particularly a mammal, preferably a human. In some instances, the subject is also a “user” (and thus the user is also the subject or patient).

As used herein, the terms “sample” and “specimen” are used interchangeably, and in the broadest senses. In one sense, sample is meant to include a specimen or culture obtained from any source, as well as biological and environmental samples. Biological samples may be obtained from animals (including humans) and encompass fluids, solids, tissues, and gases. Biological samples include blood products (such as plasma and serum), stool, urine, and the like. Environmental samples include environmental material such as surface matter, soil, mud, sludge, biofilms, water, crystals, and industrial samples. Such examples are not however to be construed as limiting the sample types applicable to the present invention.

The term “target,” when used in reference to a nucleic acid capture, detection, or analysis method, generally refers to a nucleic acid having a feature, e.g., a particular sequence of nucleotides to be detected or analyzed, e.g., in a sample suspected of containing the target nucleic acid. In some embodiments, a target is a nucleic acid having a particular sequence for which it is desirable to determine a methylation status. When used in reference to the polymerase chain reaction, “target” generally refers to the region of nucleic acid bounded by the primers used for polymerase chain reaction. Thus, the “target” is sought to be sorted out from other nucleic acid sequences that may be present in a sample. A “segment” is defined as a region of nucleic acid within the target sequence. The term “sample template” refers to nucleic acid originating from a sample that is analyzed for the presence of a target.

As used herein, the term “locus” refers to a particular position, e.g., of a mutation, polymorphism, or a C residue in a CpG dinucleotide, within a defined region or segment of nucleic acid, such as a gene or any other characterized sequence on a chromosome or RNA molecule. A locus is not limited to any particular size or length, and may refer to a portion of a chromosome, a gene, functional genetic element, or a single nucleotide or basepair. As used herein in reference to CpG sites that may be methylated, a locus refers to the C residue in the CpG dinucleotide.

The term “amplifying” or “amplification” in the context of nucleic acids refers to the production of multiple copies of a polynucleotide, or a portion of the polynucleotide, typically starting from a small amount of the polynucleotide (e.g., a single polynucleotide molecule), where the amplification products or amplicons are generally detectable. Amplification of polynucleotides encompasses a variety of chemical and enzymatic processes. The generation of multiple DNA copies from one or a few copies of a target or template DNA molecule during a polymerase chain reaction (PCR) or a ligase chain reaction (LCR; see, e.g., U.S. Pat. No. 5,494,810; herein incorporated by reference in its entirety) are forms of amplification. Additional types of amplification include, but are not limited to, allele-specific PCR (see, e.g., U.S. Pat. No. 5,639,611; herein incorporated by reference in its entirety), assembly PCR (see, e.g., U.S. Pat. No. 5,965,408; herein incorporated by reference in its entirety), helicase-dependent amplification (see, e.g., U.S. Pat. No. 7,662,594; herein incorporated by reference in its entirety), hot-start PCR (see, e.g., U.S. Pat. Nos. 5,773,258 and 5,338,671; each herein incorporated by reference in their entireties), intersequence-specific PCR, inverse PCR (see, e.g., Triglia, et al. (1988) Nucleic Acids Res., 16:8186; herein incorporated by reference in its entirety), ligation-mediated PCR (see, e.g., Guilfoyle, R. et al., Nucleic Acids Research, 25:1854-1858 (1997); U.S. Pat. No. 5,508,169; each of which are herein incorporated by reference in their entireties), methylation-specific PCR (see, e.g., Herman, et al., (1996) PNAS 93(13) 9821-9826; herein incorporated by reference in its entirety), miniprimer PCR, multiplex ligation-dependent probe amplification (see, e.g., Schouten, et al., (2002) Nucleic Acids Research 30(12): e57; herein incorporated by reference in its entirety), multiplex PCR (see, e.g., Chamberlain, et al., (1988) Nucleic Acids Research 16(23) 11141-11156; Ballabio, et al., (1990) Human Genetics 84(6) 571-573; Hayden, et al., (2008) BMC Genetics 9:80; each of which are herein incorporated by reference in their entireties), nested PCR, overlap-extension PCR (see, e.g., Higuchi, et al., (1988) Nucleic Acids Research 16(15) 7351-7367; herein incorporated by reference in its entirety), real time PCR (see, e.g., Higuchi, et al., (1992) Biotechnology 10:413-417; Higuchi, et al., (1993) Biotechnology 11:1026-1030; each of which are herein incorporated by reference in their entireties), reverse transcription PCR (see, e.g., Bustin, S. A. (2000) J. Molecular Endocrinology 25:169-193; herein incorporated by reference in its entirety), solid phase PCR, thermal asymmetric interlaced PCR, and Touchdown PCR (see, e.g., Don, et al., Nucleic Acids Research (1991) 19(14) 4008; Roux, K. (1994) Biotechniques 16(5) 812-814; Hecker, et al., (1996) Biotechniques 20(3) 478-485; each of which are herein incorporated by reference in their entireties). Polynucleotide amplification also can be accomplished using digital PCR (see, e.g., Kalinina, et al., Nucleic Acids Research. 25; 1999-2004, (1997); Vogelstein and Kinzler, Proc Natl Acad Sci USA. 96; 9236-41, (1999); International Patent Publication No. WO05023091A2; US Patent Application Publication No. 20070202525; each of which are incorporated herein by reference in their entireties).

The term “polymerase chain reaction” (“PCR”) refers to the method of K. B. Mullis U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, that describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic or other DNA or RNA, without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the “polymerase chain reaction” (“PCR”). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified” and are “PCR products” or “amplicons.” Those of skill in the art will understand the term “PCR” encompasses many variants of the originally described method using, e.g., real time PCR, nested PCR, reverse transcription PCR (RT-PCR), single primer and arbitrarily primed PCR, etc.
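As a rough illustration of why repeated cycling yields a high concentration of amplicon, the following sketch uses an idealized exponential model in which each cycle multiplies the copy number by (1 + efficiency). The model, the function names, and the efficiency parameter are simplifying assumptions and are not taken from the cited patents.

```python
import math


def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized amplicon count: N0 * (1 + E)^n, with E = 1.0 meaning doubling."""
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_per_tenfold(efficiency=1.0):
    """Cycles needed for a 10-fold increase under the same idealized model."""
    return 1.0 / math.log10(1.0 + efficiency)


# One starting molecule, 30 cycles of perfect doubling.
amplified = copies_after_cycles(1, 30)

# With perfect doubling, a 10-fold difference in starting copies shifts the
# crossing point by about 3.32 cycles.
tenfold_shift = cycles_per_tenfold(1.0)
```

Under this simplified model the number of cycles separating 10-fold differences in input is about 3.3, which is the same order as the slope values used when relating crossing points to log copy number.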

In some embodiments, target nucleic acid is amplified (e.g., by PCR) and amplified nucleic acid is detected simultaneously using an invasive cleavage assay. Assays configured for performing a detection assay (e.g., invasive cleavage assay) in combination with an amplification assay are described in US Patent Publication US 20090253142 A1 (application Ser. No. 12/404,240), incorporated herein by reference in its entirety for all purposes.

Additional amplification plus invasive cleavage detection configurations, termed the QuARTS method, are described in U.S. patent application Ser. No. 12/946,737 (now U.S. Pat. No. 8,361,720); Ser. No. 12/946,745 (now U.S. Pat. No. 8,916,344); and Ser. No. 12/946,752 (now U.S. Pat. No. 8,361,720), incorporated herein by reference in their entireties for all purposes. In performing the QuARTS method, the reaction mixture is generally subjected to the following thermocycling conditions:

1. a first set of 5 to 15 (e.g., 8 to 12) cycles of:

i) a first temperature of at least 90° C.;

ii) a second temperature in the range of 60° C. to 75° C. (e.g., 65° C. to 75° C.);

iii) a third temperature in the range of 65° C. to 75° C.;

followed by:

2. a second set of 20-50 cycles of:

i) a fourth temperature of at least 90° C.;

ii) a fifth temperature that is at least 10° C. lower than the second temperature (e.g., in the range of 50° C. to 55° C.); and

iii) a sixth temperature in the range of 65° C. to 75° C.

No additional reagents need to be added to the reaction mixture during the thermocycling, e.g., between the first and second sets of cycles. In particular embodiments, the thermostable polymerase is not inactivated between the first and second sets of conditions, thereby allowing the target to be amplified during each cycle of the second set of cycles. In particular embodiments, the second and third temperatures are the same temperature such that “two step” thermocycling conditions are performed. Each of the cycles may be independently of a duration in the range of 10 seconds to 3 minutes, although durations outside of this range are readily employed.
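The general thermocycling ranges listed above can be captured in a small data structure and checked programmatically. The sketch below is illustrative only: the class, field, and function names are hypothetical, the example temperatures are arbitrary values chosen to fall inside the stated ranges, and the checks flag departures from the general ranges rather than declare a program unusable.

```python
from dataclasses import dataclass


@dataclass
class CyclingProgram:
    first_set_cycles: int
    first_set_temps: tuple   # (first, second, third) temperatures in deg C
    second_set_cycles: int
    second_set_temps: tuple  # (fourth, fifth, sixth) temperatures in deg C


def check_program(p):
    """Return a list of departures from the general ranges (empty if none)."""
    problems = []
    t1, t2, t3 = p.first_set_temps
    t4, t5, t6 = p.second_set_temps
    if not 5 <= p.first_set_cycles <= 15:
        problems.append("first set should have 5 to 15 cycles")
    if not 20 <= p.second_set_cycles <= 50:
        problems.append("second set should have 20 to 50 cycles")
    if t1 < 90 or t4 < 90:
        problems.append("first and fourth temperatures should be at least 90 C")
    if not 60 <= t2 <= 75:
        problems.append("second temperature should be 60 C to 75 C")
    if not 65 <= t3 <= 75 or not 65 <= t6 <= 75:
        problems.append("third and sixth temperatures should be 65 C to 75 C")
    if t2 - t5 < 10:
        problems.append("fifth temperature should be at least 10 C below the second")
    return problems


# Hypothetical programs: the second one uses a fifth temperature only 7 C
# below the second temperature.
ok = check_program(CyclingProgram(10, (95, 67, 70), 35, (95, 53, 70)))
bad = check_program(CyclingProgram(10, (95, 67, 70), 35, (95, 60, 70)))
```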

In each cycle of the second set of cycles (e.g., while the reaction is at the fifth temperature), a signal generated by cleavage of the flap probe may be measured to provide a real-time measurement of the amount of target nucleic acid in the sample (where the term “real-time” is intended to refer to a measurement that is taken as the reaction progresses and products accumulate). The measurement may be expressed as an absolute number of copies or a relative amount when normalized to a control nucleic acid in the sample.

Without being bound to any specific theory, it is believed that the higher reaction temperatures in the first set of cycles may allow the target nucleic acid to be efficiently amplified by the pair of PCR primers without significant interference by any of the flap assay reagents or their reaction products. The lower reaction temperature used in the second set of cycles (i.e., the fifth temperature) is not optimum for the polymerase used for PCR, but allows the flap oligonucleotide to efficiently hybridize to the target nucleic acid and is closer to the optimum temperature of the flap endonuclease. The lower reaction temperature used in the second set of cycles also facilitates subsequent hybridization of the cleaved flap to the FRET cassette. Thus, at a lower temperature, the target nucleic acid may be detected without significant interference from the PCR reagents.

The term “real time” as used herein in reference to detection of nucleic acid amplification or signal amplification refers to the detection or measurement of the accumulation of products or signal in the reaction while the reaction is in progress, e.g., during incubation or thermal cycling. Such detection or measurement may occur continuously, or it may occur at a plurality of discrete points during the progress of the amplification reaction, or it may be a combination. For example, in a polymerase chain reaction, detection (e.g., of fluorescence) may occur continuously during all or part of thermal cycling, or it may occur transiently, at one or more points during one or more cycles. In some embodiments, real time detection of PCR or QuARTS reactions is accomplished by determining a level of fluorescence at the same point (e.g., a time point in the cycle, or temperature step in the cycle) in each of a plurality of cycles, or in every cycle. Real time detection of amplification may also be referred to as detection “during” the amplification reaction.

As used herein, the term “quantitative amplification data set” refers to the data obtained during quantitative amplification of the target sample, e.g., target DNA. In the case of quantitative PCR or QuARTS assays, the quantitative amplification data set is a collection of fluorescence values obtained during amplification, e.g., during a plurality of, or all of the thermal cycles. Data for quantitative amplification is not limited to data collected at any particular point in a reaction, and fluorescence may be measured at a discrete point in each cycle or continuously throughout each cycle.
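One simple way to derive a crossing point from such a data set is sketched below, assuming one fluorescence reading per cycle and a fixed signal threshold with linear interpolation between cycles. This is only one possible approach; instruments may use other algorithms, and the function name, threshold, and fluorescence values are hypothetical.

```python
def crossing_point(fluorescence, threshold):
    """Fractional cycle at which the signal first rises through the threshold.

    fluorescence[0] is the reading for cycle 1. Returns None if the signal
    never crosses the threshold.
    """
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            # lo was read at cycle i and hi at cycle i + 1; interpolate between.
            return i + (threshold - lo) / (hi - lo)
    return None


# Hypothetical per-cycle fluorescence readings for one target in one well.
signal = [10.0, 10.0, 12.0, 20.0, 40.0, 80.0]
cp = crossing_point(signal, threshold=30.0)
flat = crossing_point([10.0, 10.5, 11.0], threshold=30.0)
```

Applying the same function to the variant and non-variant (or reference) signals collected from a single reaction yields the pair of crossing points used in the ratio calculations.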

The term “invasive cleavage structure” as used herein refers to a cleavage structure comprising i) a target nucleic acid, ii) an upstream nucleic acid (e.g., an INVADER oligonucleotide), and iii) a downstream nucleic acid (e.g., a probe), where the upstream and downstream nucleic acids anneal to contiguous regions of the target nucleic acid, and where an overlap forms between a 3′ portion of the upstream nucleic acid and the duplex formed between the downstream nucleic acid and the target nucleic acid. An overlap occurs where one or more bases from the upstream and downstream nucleic acids occupy the same position with respect to a target nucleic acid base, whether or not the overlapping base(s) of the upstream nucleic acid are complementary with the target nucleic acid, and whether or not those bases are natural bases or non-natural bases. In some embodiments, the 3′ portion of the upstream nucleic acid that overlaps with the downstream duplex is a non-base chemical moiety such as an aromatic ring structure, e.g., as disclosed, for example, in U.S. Pat. No. 6,090,543, incorporated herein by reference in its entirety. In some embodiments, one or more of the nucleic acids may be attached to each other, e.g., through a covalent linkage such as nucleic acid stem-loop, or through a non-nucleic acid chemical linkage (e.g., a multi-carbon chain).

As used herein, the term “nucleic acid molecule” refers to any nucleic acid containing molecule, including but not limited to, DNA or RNA. The term encompasses sequences that include any of the known base analogs of DNA and RNA including, but not limited to, 4-acetylcytosine, 8-hydroxy-N6-methyladenosine, aziridinylcytosine, pseudoisocytosine, 5-(carboxyhydroxyl-methyl) uracil, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethyl-aminomethyluracil, dihydrouracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudo-uracil, 1-methylguanine, 1-methylinosine, 2,2-dimethyl-guanine, 2-methyladenine, 2-methylguanine, 3-methyl-cytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxy-amino-methyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarbonylmethyluracil, 5-methoxyuracil, 2-methylthio-N-isopentenyladenine, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, oxybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, N-uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, pseudouracil, queosine, 2-thiocytosine, and 2,6-diaminopurine.

The term “wild-type” refers to a gene or gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the terms “modified,” “mutant,” and “variant” refer to a gene or gene product that displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.

As used herein, the term “gene” refers to a nucleic acid (e.g., DNA) sequence that comprises coding sequences necessary for the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA). A polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, immunogenicity, etc.) of the full-length or fragment polypeptide are retained. The term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5′ and 3′ ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. Sequences located 5′ of the coding region and present on the mRNA are referred to as 5′ non-translated sequences. Sequences located 3′ or downstream of the coding region and present on the mRNA are referred to as 3′ non-translated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene that are transcribed into nuclear RNA (e.g., hnRNA); introns may contain regulatory elements (e.g., enhancers). Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5′ and 3′ end of the sequences that are present on the RNA transcript. These sequences are referred to as “flanking” sequences or regions (these flanking sequences are located 5′ or 3′ to the non-translated sequences present on the mRNA transcript). The 5′ flanking region may contain regulatory sequences such as promoters and enhancers that control or influence the transcription of the gene. The 3′ flanking region may contain sequences that direct the termination of transcription, post-transcriptional cleavage and polyadenylation.

The abbreviations “Ct” and “Cp” as used herein in reference to data collected during real time PCR and PCR+INVADER assays refer to the cycle at which signal (e.g., fluorescent signal) crosses a predetermined threshold value indicative of positive signal. Various methods have been used to calculate the threshold that is used as a determinant of signal versus concentration, and the value is generally expressed as either the “crossing threshold” (Ct) or the “crossing point” (Cp). Either Cp values or Ct values may be used in embodiments of the methods presented herein for analysis of real-time signal for the determination of the percentage of variant and/or non-variant constituents in an assay or sample.

As used herein, the term “kit” refers to any delivery system for delivering materials. In the context of nucleic acid purification systems and reaction assays, such delivery systems include systems that allow for the storage, transport, or delivery of reagents and devices (e.g., inhibitor adsorbents, particles, denaturants, oligonucleotides, spin filters, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing a procedure, etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials. As used herein, the term “fragmented kit” refers to a delivery system comprising two or more separate containers that each contains a subportion of the total kit components. The containers may be delivered to the intended recipient together or separately. For example, a first container may contain materials for sample collection and a buffer, while a second container contains capture oligonucleotides and denaturant. The term “fragmented kit” is intended to encompass kits containing Analyte Specific Reagents (ASRs) regulated under section 520(e) of the Federal Food, Drug, and Cosmetic Act, but is not limited thereto. Indeed, any delivery system comprising two or more separate containers that each contains a subportion of the total kit components is included in the term “fragmented kit.” In contrast, a “combined kit” refers to a delivery system containing all of the components of a reaction assay in a single container (e.g., in a single box housing each of the desired components). The term “kit” includes both fragmented and combined kits.

The term “system” as used herein refers to a collection of articles for use for a particular purpose. In some embodiments, the articles comprise instructions for use, as information supplied on, e.g., an article, on paper, or on recordable media (e.g., diskette, CD, flash drive, etc.). In some embodiments, instructions direct a user to an online location, e.g., a website.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described.

DETAILED DESCRIPTION OF THE INVENTION

Before the present invention is described in greater detail, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims. For example, while the invention is discussed in terms of measuring methylation of DNA, it is to be understood that the methods and systems discussed in this Description and in the Summary of the Invention also encompass methods for measuring a percentage of mutation or polymorphism in a nucleic acid target.

Provided herein is a multiplexed, quantitative assay for determining a percent methylation or other variation (e.g., mutation, polymorphism, deletion) of a target nucleic acid without reference to external standards, curves or controls. In certain embodiments, the assay comprises a combined amplification reaction (e.g., polymerase chain reaction) with a signal-generating system (e.g., INVADER assay invasive cleavage reaction). In certain embodiments, a mixture of target nucleic acids, e.g., bisulfite-treated non-methylated and methylated DNAs, is treated using PCR+INVADER assays configured to quantitatively emit signal from different labels. For example, in some embodiments, bisulfite-treated non-methylated DNA is detected using a first dye (e.g., FAM dye), and bisulfite-treated methylated DNA is detected using a second dye (e.g., CAL RED dye), with both assays conducted in the same vessel, at the same time. Although the two assays amplify simultaneously and in the same reaction well or vessel, they are detected in different channels. The concentration of nucleic acid target going into the reaction may vary, as may the percent methylation or variation. In some embodiments, percent methylation of a target nucleic acid correlates with clinical presence of cancer or advanced adenoma. Normal samples have low to absent methylation, whereas adenomas and cancers can be greater than 90% methylated.

As noted above, real time thermal cycling detection reactions, such as real time qPCR, monitor signal as a function of the number of thermal cycles. Typically, one measure of the amount of target nucleic acid in a sample is the cycle at which signal (e.g., fluorescent signal) crosses a predetermined threshold value indicative of signal that is above background noise. Various methods have been used to calculate the threshold that is used as a determinant of signal that is above background and is thus indicative of target concentration, and the value is generally expressed as either the crossing threshold (Ct) or the crossing point (Cp). The particular signal level set as the threshold is influenced by the particular chemistry of a reaction and the instrumentation used to measure the real-time signal, and is generally set just above the baseline signal (noise) measured in early cycles, before significant target or signal amplification has occurred. In some embodiments, a Ct or Cp is set as a percentage of the maximum signal, e.g., a percentage of the highest level of fluorescence measured in a calibrator or control measured during the same experiment (e.g., in a well on the same plate in a thermal cycling instrument).

Because the signal and crossing point vary from assay to assay, when using prior methods, a standard curve is typically run at the same time, and unknown samples are interpolated from the curve. The standard curve is typically from a dilution series of target nucleic acid at known concentrations. While the precision of the Cps run in the same assay (e.g., in different wells, but on the same reaction plate) can be very high, real time PCR can be imprecise (>25% CV) run-to-run, even with a standard curve.
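The prior-art standard curve approach can be illustrated with a minimal Python sketch. The dilution series below is invented for illustration and is not data from this disclosure; the function names `fit_standard_curve` and `interpolate_copies` are likewise hypothetical. The sketch fits Cp against log10 copy number by ordinary least squares and then interpolates an unknown from the fitted line.

```python
import math

def fit_standard_curve(copies, cps):
    """Least-squares fit of Cp = intercept + slope * log10(copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cps) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def interpolate_copies(cp, slope, intercept):
    """Read an unknown's copy number off the fitted line."""
    return 10 ** ((cp - intercept) / slope)

# Hypothetical 10-fold dilution series (assumed values, 3.3 cycles per decade).
copies = [1e2, 1e3, 1e4, 1e5]
cps = [33.4, 30.1, 26.8, 23.5]

slope, intercept = fit_standard_curve(copies, cps)
unknown_copies = interpolate_copies(28.45, slope, intercept)
```

With these assumed inputs the fitted slope is negative (Cp falls as copy number rises), and the intercept corresponds to the Cp at log copy number = 0.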

In embodiments of the present invention, the ratio of different species of DNA in a sample (e.g., methylated to non-methylated; mutant to wild type; etc.) is determined without reference to external standards or controls (i.e., without reference to controls that are not within the same reaction tube, well or other vessel containing the reaction mixture).

In the following discussion, the crossing point indicative of real signal is referred to as “Cp” but the discussion is equally applicable to analysis of threshold cycles (“Ct”) and other measures of the cycle at which a positive signal is detected in a real-time detection assay.

Since the relationship of Cp to concentration is a linear log relationship and there is roughly a doubling of target at each cycle, the log copy number (or concentration) is described by

Log copy = (C_(p)x − C_(p)0)/−3.3

where C_(p)x is the crossing point of the unknown and C_(p)0 is the intercept of the y axis where log copy number = 0. The value 3.3 is the slope of the linear regression for a PCR having 100% efficiency (i.e., perfect doubling).
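As a minimal sketch of the relationship above, the following Python function evaluates the log copy number from a crossing point. The function name, parameter names, and example Cp values are assumptions chosen for illustration. For reference, perfect doubling corresponds to 1/log10(2), approximately 3.32 cycles per 10-fold change, which the text rounds to 3.3.

```python
import math

def log_copy_number(cp_x, cp_0, slope_magnitude=3.3):
    """Log copy = (Cp_x - Cp_0) / -slope, per the equation above."""
    return (cp_x - cp_0) / -slope_magnitude

# Cycles per decade for perfect doubling: 1 / log10(2), about 3.32.
ideal_slope = 1.0 / math.log10(2.0)

# Hypothetical values: intercept at Cp 40.0, unknown crossing at Cp 30.1.
log_copies = log_copy_number(30.1, 40.0)
copies = 10 ** log_copies
```

With these assumed values the unknown lies 9.9 cycles before the intercept, i.e., three decades, or about 1000 copies.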

Typically, however, the measured slope varies based on the actual doubling efficiency of the assay. Further, for the PCR+INVADER assay, the PCR amplification is coupled to signal amplification from the INVADER invasive cleavage assay. For PCR+INVADER assays, such as the QuARTS assay, the slope can be less than 2.7. If the slope is in the range of about 2.7 to 3.3, then using 3 as the slope value will be acceptable and will have little effect on the result.

For multiplexed PCR and PCR+INVADER assays (e.g., detection of two or more different targets, alleles, etc.) the assays are performed together and they can have the same slope and intercept, or they can have known differences in slope and intercept. In certain preferred embodiments, the two assays are configured to have the same slope and intercept.

In determining the relative proportions of variants in a mixture, the sum of the signals (e.g., the signal from methylated DNA, plus the signal from unmethylated DNA of a gene or locus) necessarily represents the total (100%) of the target DNA in the test sample. One can therefore calculate the percent methylation by comparing the two signals to each other, without the need to determine an absolute concentration of the target nucleic acids. The ratio of signals can be determined by the following calculation:

Ratio = 10^((C_(p)m − C_(p)nm + C_(p)off)/−S)

where C_(p)m is the crossing threshold/crossing point for the methylated target, C_(p)nm is the crossing point for the non-methylated target and C_(p)off is the offset Cp between the two regression lines of C_(p)nm or C_(p)m vs. the log concentration of non-methylated or methylated target, respectively. If the lines are superimposable, then C_(p)off is 0. S is the slope of the linear regression for C_(p) vs. log copy number for both targets. As noted above, for a slope in the range 2.7 to 3.3, using 3 will be acceptable and have little effect on the result.

The same equation can be used for analysis of any variation in a nucleic acid population, e.g., mutation, polymorphism, allele copy number, etc. The general version of the equation may be presented as follows:

Ratio (R) = 10^((C_(p)v − C_(p)nv + C_(p)off)/−S)

where “v” and “nv” represent variant and nonvariant components of the sample.
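The general ratio equation can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the defaults (offset of 0, slope of 3) simply reflect the superimposable-lines case and the approximate slope value discussed in the text.

```python
def variant_ratio(cp_v, cp_nv, cp_off=0.0, slope=3.0):
    """R = 10 ** ((Cp_v - Cp_nv + Cp_off) / -S)."""
    return 10 ** ((cp_v - cp_nv + cp_off) / -slope)

# Hypothetical crossing points: the variant signal crosses three cycles
# earlier than the non-variant signal, so variant copies are more abundant.
ratio_early = variant_ratio(27.0, 30.0)

# Equal crossing points with no offset give equal amounts.
ratio_equal = variant_ratio(30.0, 30.0)
```

An earlier (smaller) variant Cp yields a ratio above 1, and a later variant Cp yields a ratio below 1, because of the negative sign on S in the exponent.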

Once the ratio has been calculated, the fraction or percent methylation can be calculated as follows:

% methylation = 100 × (R/(R+1))

where R is the Ratio. The absolute copy number is not required to calculate the percent methylation.
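The percent calculation, together with the earlier remark that a slope anywhere from 2.7 to 3.3 has little effect, can be checked numerically with a short Python sketch. The function names and the example crossing points (29.0 and 30.0) are assumptions for illustration only.

```python
def ratio_from_cps(cp_v, cp_nv, cp_off=0.0, slope=3.0):
    """R = 10 ** ((Cp_v - Cp_nv + Cp_off) / -S)."""
    return 10 ** ((cp_v - cp_nv + cp_off) / -slope)

def percent_variant(ratio):
    """Percent = 100 * R / (R + 1)."""
    return 100.0 * ratio / (ratio + 1.0)

# Same hypothetical Cp pair evaluated at the low, nominal, and high slope.
percents = [
    percent_variant(ratio_from_cps(29.0, 30.0, slope=s))
    for s in (2.7, 3.0, 3.3)
]
spread = max(percents) - min(percents)
```

For this one-cycle difference the three slopes give results of roughly 67% to 70%, a spread of a few percentage points. The spread depends on the Cp difference, so this is a single illustration rather than a general bound.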

In some embodiments, a reference gene, e.g., a gene known to be present in an invariant number of copies per genome, such as actin, is used. If a reference gene is used and both the reference gene and the methylated gene are single copy genes, it is possible to use only the ratio equation to arrive at % methylation:

R × 100 = % methylation

Exemplary data comparing the percent methylation calculated for a vimentin gene to the percent methylation predicted using the methods described above

In this embodiment, the reference gene is present in the same number of copies as the sum of the methylated and non-methylated copies of the test gene.
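When a reference gene stands in for the total copy number, the ratio itself is the methylated fraction. A minimal Python sketch follows, assuming the reference assay shares the slope of the methylated assay and that any offset is supplied by the caller; the function name and Cp values are hypothetical.

```python
def percent_methylation_vs_reference(cp_m, cp_ref, cp_off=0.0, slope=3.0):
    """% methylation = R * 100, with R computed against a reference gene."""
    ratio = 10 ** ((cp_m - cp_ref + cp_off) / -slope)
    return ratio * 100.0

# Hypothetical values: the methylated signal crosses three cycles after the
# reference signal, i.e., about one tenth as many copies.
pct = percent_methylation_vs_reference(31.0, 28.0)
```

This sketch does not cap the result at 100%; measurement noise could push a fully methylated sample slightly above that value.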

The methods can also be used to calculate percent mutation (or polymorphism) in a target gene. For example, KRAS mutations can be used to identify colon cancer and adenoma. By determining the ratio of the copies containing mutation to either the wild type sequence or to a reference gene having the same copy number, an estimate of % mutation can be obtained and used to differentiate normal DNA samples from cancer DNA samples.

Although the disclosure herein refers to certain illustrated embodiments, it is to be understood that these embodiments are presented by way of example and not by way of limitation.

EXAMPLES

Measurement of Methylated and Unmethylated Vimentin Gene Sequences

Embodiments of the present invention are used to quantitate the methylated CpG sequences of vimentin (VIM) in the presence of unmethylated VIM sequence. In order to simulate the methylated and unmethylated genomic DNA, plasmids may be prepared and cloned to match the sequence that results following the bisulfite reaction conversion of unmethylated C to U, which behaves as if it were a T in the PCR process. The methylated version of the sequence uses a plasmid with the CpG motif intact and the unmethylated representative plasmid replaces this with a TpG motif.
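The sequence logic behind the two plasmids can be sketched as an in silico conversion. The Python function below is an illustrative simplification: it treats a single strand, assumes complete conversion, and assumes that either every CpG cytosine is methylated or none is. The function name and the short input sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """Return the post-conversion, post-PCR sequence of one strand.

    Unmethylated C reads as T; C in a CpG is kept only when methylated.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (methylated_cpg and in_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)

# Hypothetical 7-base input containing one CpG and two non-CpG cytosines.
methylated_template = bisulfite_convert("ACGTCCA", methylated_cpg=True)
unmethylated_template = bisulfite_convert("ACGTCCA", methylated_cpg=False)
```

The two outputs differ only at the CpG position (CpG versus TpG), which is the difference the methylated and unmethylated plasmids are built to represent.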

In this example, 3 CpGs are designed on each primer of the vimentin methylation detection assay, with one at the 3′ end of the forward primer. In this assay, the forward primer is also the invasive oligonucleotide. There are also CpG motifs located at the cleavage point of the flap probe, in both senses. The assay is then used to detect methylated copies spiked in unmethylated copies at different levels.

A target sequence of the plasmid representing the methylated sequence is as follows, with every C base corresponding to a methyl C for an analogous genomic DNA:

(SEQ ID NO.: 1) 5′TCGTGTTTTCGTTTTTTTATCGTAGGATGTTCGGCGGTTCGGGTATCGCGAGTCGGTCGAGTTTTAGTCGGAGTTACGTGATTACGTTTATTCG TATTTATAGTTTGGGCGACG 3′

An exemplary assay employs a forward primer 5′-GGCGGTTCGGGTATCG-3′ (SEQ ID NO.:2), a reverse primer 5′-CGTAATCACGTAACTCCGACT-3′ (SEQ ID NO.:3), and a flap probe 5′-GACGCGGAGGCGAGTCGGTCG-3′/3C6 (SEQ ID NO.:4) where the area corresponding to methylated bases is shown underlined and the 3′-end is blocked, e.g., with a hexanediol or amino group in order to inhibit primer extension. The first nine bases of the flap probe in this example are the region that is cleaved away by the flap endonuclease and that then binds to a FRET cassette. Primers and flap probes are generally supplied as non-catalog items by Integrated DNA Technologies (IDT, Coralville, Iowa).

A FRET cassette usable with these primer and probe oligonucleotides is 5′-FAM/TCT/Quencher/AGCCGGTTTTCCGGCT GAGACTCCGCGTCCGT-3′/3C6 (SEQ ID NO.:5), where FAM is fluorescein, the quencher is the ECLIPSE Dark Quencher, and the 3′-end is blocked, e.g., with a hexanediol group.
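The relationship among the listed oligonucleotides can be checked with a short Python sketch. The sequences are copied from SEQ ID NOs. 1 to 5 above with internal spaces removed; the variable names, the treatment of the first nine probe bases as the flap, and the use of only the 3′ arm of the FRET cassette are choices made for this illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))

TARGET = (
    "TCGTGTTTTCGTTTTTTTATCGTAGGATGTTCGGCGGTTCGGGTATCGCGAGTCGGTCGAG"
    "TTTTAGTCGGAGTTACGTGATTACGTTTATTCG"
    "TATTTATAGTTTGGGCGACG"
)
FORWARD = "GGCGGTTCGGGTATCG"
REVERSE = "CGTAATCACGTAACTCCGACT"
FLAP_PROBE = "GACGCGGAGGCGAGTCGGTCG"
FRET_3PRIME_ARM = "GAGACTCCGCGTCCGT"

# Split the probe into the 9-base flap and the target-specific region.
flap, probe_target = FLAP_PROBE[:9], FLAP_PROBE[9:]

fwd_start = TARGET.find(FORWARD)
probe_start = TARGET.find(probe_target)
rev_site = TARGET.find(reverse_complement(REVERSE))

# Number of target positions shared by the forward primer's 3' end and
# the probe's target-specific region.
overlap = fwd_start + len(FORWARD) - probe_start

# The released flap should be complementary to part of the cassette arm.
flap_binds_cassette = reverse_complement(flap) in FRET_3PRIME_ARM
```

In this check the forward primer and the probe's target-specific region share one target position, which is consistent with the forward primer acting as the invasive oligonucleotide, and the reverse primer site lies downstream of the probe.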

Exemplary cycling conditions are 95° C. for 2 min; 45 cycles at 95° C. for 20 sec, 53° C. for 1 min; and 40° C. to hold. Fluorescent signal acquisition is done, e.g., at the 53° C. point in the cycle. The PCR reactions may be done, e.g., in LightCycler 480 Multiwell 96 Plates (Roche, Indianapolis) in 10 mM MOPS pH 7.5, with 7.5 mM MgCl₂, and 250 μM dNTPs (Promega, Madison, Wis.). In some embodiments, the Taq polymerase is the HotStart GoTaq enzyme (Promega, Madison, Wis.) and the cleavage enzyme is Cleavase 2.0 (Hologic, Inc., Madison, Wis.). In some embodiments, forward primer concentration is 500 nM, reverse primer concentration is 500 nM, flap probe is at 500 nM, and the FRET cassette is used at a final concentration of 200 nM. Amplification and detection may be performed, e.g., in the LightCycler 480 optical thermocycler (Roche, Indianapolis, Ind.). The Cp is calculated as being the point at which fluorescence rises to 18% of the maximum fluorescence.
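One way to compute a Cp at 18% of maximum fluorescence is sketched below in Python. The text does not specify baseline handling or how the crossing is located between cycles, so the linear interpolation between adjacent readings, the function name, and the sample readings are all assumptions of this sketch rather than the instrument's algorithm.

```python
def crossing_point(fluorescence, fraction=0.18):
    """Fractional cycle at which signal first reaches fraction * max.

    `fluorescence` holds one reading per cycle, with cycle 1 first.
    Returns None if the signal never crosses the threshold from below.
    """
    threshold = fraction * max(fluorescence)
    for i in range(1, len(fluorescence)):
        prev, cur = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= cur:
            # prev is the reading at cycle i, cur the reading at cycle i + 1.
            return i + (threshold - prev) / (cur - prev)
    return None

# Hypothetical readings for eight cycles (arbitrary fluorescence units).
readings = [0.0, 0.0, 0.0, 10.0, 30.0, 60.0, 100.0, 100.0]
cp = crossing_point(readings)
```

Here the threshold is 18 units, which falls between the readings at cycles 4 and 5, so the interpolated Cp lies between those two cycles.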

All publications and patents mentioned in the above specification are herein incorporated by reference in their entirety for all purposes. Various modifications and variations of the described compositions, methods, and uses of the technology will be apparent to those skilled in the art without departing from the scope and spirit of the technology as described. Although the technology has been described in connection with specific exemplary embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in pharmacology, biochemistry, medical science, or related fields are intended to be within the scope of the following claims.

I claim:
1. A method of analyzing a sample comprising a nucleic acid target, the method comprising:
a) providing a sample comprising nucleic acid target, wherein said nucleic acid target comprises at least one copy of a non-variant form of said nucleic acid target and/or at least one copy of a variant form of said nucleic acid target,
b) providing reagents for an amplification reaction comprising a polymerase chain reaction and an invasive cleavage detection assay, wherein said reagents comprise concentrations of oligonucleotides comprising forward primers and reverse primers complementary to portions of said variant and non-variant forms of said nucleic acid target, FRET cassettes, and flap probes, said flap probes comprising a portion complementary to a region of a nucleic acid target and a flap portion complementary to a region of a FRET cassette, wherein the concentrations of said oligonucleotides in said reagents produce quantitative amplification data in said amplification reaction from said variant form and said non-variant form of nucleic acid target, said data having the same known slope S for each form, wherein S=the slope of the linear regression of reaction efficiency; and having previously determined C_(p)0 and C_(p)off values, wherein C_(p)0 is the intercept of the y axis when log copy number=0 and C_(p)off is an offset between regression lines of crossing thresholds of quantitative amplification data from standard curves for said variant and non-variant forms of nucleic acid target; wherein in said amplification reaction, non-variant and/or variant nucleic acid target is amplified, and wherein amplified nucleic acid is detected by invasive cleavage to produce quantitative amplification data from said non-variant form and said variant form of said nucleic acid target, and wherein said quantitative amplification data correlates to amplification cycle numbers for said non-variant and variant forms of said nucleic acid target present in said sample;
c) amplifying said nucleic acid target using the reagents of step b) to produce quantitative amplification data from said non-variant form and said variant form of said nucleic acid target, wherein said amplifying comprises subjecting a mixture of said nucleic acid target and the reagents of step b) to the following thermal cycling conditions: a first set of 5-15 cycles of: i) a first temperature of at least 90° C.; ii) a second temperature in the range of 60° C. to 75° C.; iii) a third temperature in the range of 65° C. to 75° C.; followed by: a second set of 20-50 cycles of: i) a fourth temperature of at least 90° C.; ii) a fifth temperature that is at least 10° C. lower than said second temperature; iii) a sixth temperature in the range of 65° C. to 75° C.; wherein no additional reagents are added to said reaction between said first and second sets of cycles and, in each cycle of said second set of cycles, cleavage of a flap probe is measured;
d) determining a first crossing threshold indicative of variant copies of said nucleic acid target (C_(p)v);
e) determining a second crossing threshold indicative of non-variant copies of said nucleic acid target (C_(p)nv);
f) calculating a ratio R=10^((C_(p)v−C_(p)nv+C_(p)off)/−S), and
g) calculating the percentage of nucleic acid target present in said sample that is variant form, wherein the percentage of nucleic acid target in said sample that is variant form is calculated as percent variant=100×(R/(R+1)).
2. The method of claim 1, wherein the value used for S is greater than or equal to about 2.7.
3. The method of claim 1, wherein the value used for S is less than or equal to about 3.3.
4. The method of claim 1, wherein the value used for S is 3.
5. The method of claim 1, wherein said variant nucleic acid is methylated DNA and said non-variant nucleic acid is unmethylated DNA.
6. The method of claim 1, wherein said variant nucleic acid is nucleic acid containing a mutation and said non-variant nucleic acid is wild type nucleic acid.